Client Report - What’s in a Name?

Course DS 250

Author

Dallin Moak

import pandas as pd
import numpy as np
from lets_plot import *

LetsPlot.setup_html(isolated_frame=True)

Project Notes

For Project 1, the answer to each question should include a chart and a written response. The year labels on your charts should not include a comma. At least two of your charts must include reference marks.

import os

# Sanity check: the working directory should contain p1_source.py so the import below succeeds
os.getcwd()
from p1_source import my_name_plot_data

# my_name_plot_data

Source code

Source code is available in p1_source.py.

QUESTION|TASK 1

How does your name at your birth year compare to its use historically?

My name, “Dallin”, was most common about three years after I was born (around 1999). It had been climbing out of obscurity for almost eight years before my birth in 1996.
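To make the comparison concrete, you can look up a name's peak year and measure how far it falls from the birth year. This is a minimal sketch that assumes `name`, `year`, and `Total` columns, as in the tables printed in this report. `peak_year` is a hypothetical helper, and the demo counts are invented.

```python
import pandas as pd

def peak_year(df: pd.DataFrame, name: str) -> int:
    """Return the year in which `name` has its largest Total count."""
    rows = df[df["name"] == name]
    # idxmax gives the index label of the largest Total; look up that row's year
    return int(rows.loc[rows["Total"].idxmax(), "year"])

# Invented counts for illustration only; the real analysis uses names_year.csv
demo = pd.DataFrame({
    "name": ["Dallin"] * 5,
    "year": [1988, 1992, 1996, 1999, 2004],
    "Total": [5, 40, 120, 180, 90],
})

birth_year = 1996
peak = peak_year(demo, "Dallin")
print(f"Peak year: {peak}, {peak - birth_year} years after birth")
```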

from p1_source import my_name_plot

my_name_plot

QUESTION|TASK 2

If you talked to someone named Brittany on the phone, what is your guess of his or her age? What ages would you not guess?

The dataset indicates that Brittanys aged 35 (as of 2025) are the most common, so 35 would be my guess. There are very few people by that name older than 50 and only a few younger than 19, so I would not guess ages outside that range.
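The age estimate comes from converting birth years into ages. This sketch assumes a reference year of 2025 and the same `name`, `year`, and `Total` columns. The counts are made up. Note that it counts recorded births, not living people, so it ignores mortality and migration.

```python
import pandas as pd

# Made-up counts for illustration only; not real Brittany data
demo = pd.DataFrame({
    "name": ["Brittany"] * 5,
    "year": [1970, 1980, 1990, 2000, 2010],
    "Total": [100, 5000, 30000, 8000, 500],
})

CURRENT_YEAR = 2025  # reference year used in the answer above

brittany = demo[demo["name"] == "Brittany"].copy()
brittany["age"] = CURRENT_YEAR - brittany["year"]

# Age of the largest birth cohort (births only; ignores mortality and migration)
modal_age = int(brittany.loc[brittany["Total"].idxmax(), "age"])

# Share of recorded births in an age band, here "older than 50"
older_than_50 = brittany.loc[brittany["age"] > 50, "Total"].sum() / brittany["Total"].sum()

print(modal_age)  # 35
print(round(older_than_50, 4))
```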

from p1_source import brittany_plot

brittany_plot

QUESTION|TASK 3

Mary, Martha, Peter, and Paul are all Christian names. From 1920 - 2000, compare the name usage of each of the four names in a single chart. What trends do you notice?

All four names dip starting around 1925, then rise sharply during the World War II years and peak after the war ends.
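Comparing the four names first requires filtering to those names and to 1920 through 2000. This sketch assumes one row per name and year with a `Total` column. `christian_names_wide` is a hypothetical helper, and the demo rows are invented. The filtered long-format `subset` is also what a lets-plot chart with `color="name"` would use.

```python
import pandas as pd

NAMES = ["Mary", "Martha", "Peter", "Paul"]

def christian_names_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the four names between 1920 and 2000 (inclusive), one column per name."""
    subset = df[df["name"].isin(NAMES) & df["year"].between(1920, 2000)]
    # Pivot to a year x name table so the names can be compared side by side
    return subset.pivot_table(index="year", columns="name", values="Total", aggfunc="sum")

# Invented rows: the 1910 row and "John" should both be filtered out
demo = pd.DataFrame({
    "name": ["Mary", "Mary", "Peter", "Paul", "Martha", "John"],
    "year": [1910, 1925, 1925, 1925, 1925, 1925],
    "Total": [10, 20, 3, 4, 5, 6],
})

wide = christian_names_wide(demo)
print(wide)
```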

from p1_source import q3_data_plot
q3_data_plot

This graph normalizes each name's counts relative to its frequency in 1920. The four names follow a very similar pattern, with Peter the most strongly affected.
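One way to normalize relative to 1920 is to divide each name's yearly count by that name's 1920 count. With this approach, every name starts at 1.0. The sketch assumes exactly one row per name and year. It is not necessarily how p1_source.py does it, and the counts are invented.

```python
import pandas as pd

def normalize_to_base_year(df: pd.DataFrame, base_year: int = 1920) -> pd.DataFrame:
    """Add a `relative` column: each Total divided by the same name's Total in base_year."""
    # Assumes one row per (name, year); names without a base-year row get NaN
    base = df.loc[df["year"] == base_year].set_index("name")["Total"]
    out = df.copy()
    out["relative"] = out["Total"] / out["name"].map(base)
    return out

# Invented counts for illustration only
demo = pd.DataFrame({
    "name": ["Mary", "Mary", "Peter", "Peter"],
    "year": [1920, 1950, 1920, 1950],
    "Total": [100, 150, 10, 30],
})

normalized = normalize_to_base_year(demo)
print(normalized)
```

Dividing by a per-name baseline puts names with very different absolute popularity on one scale. That is why a smaller name like Peter can show the largest relative swing.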

from p1_source import q3_data_plot_alt
q3_data_plot_alt

QUESTION|TASK 4

Think of a unique name from a famous movie. Plot the usage of that name and see how changes line up with the movie release. Does it look like the movie had an effect on usage?

Neo, from The Matrix (1999), has no occurrences in the dataset before 1999. This correlation might indicate a causal relationship between the movie's release and use of the name, though it does not prove one. There seems to be no significant decrease in the name's frequency over the roughly 1.5 decades of data since the release. The Matrix sequels may also have had an effect, but both were released in 2003 (not counting 2021's The Matrix Resurrections). The name appears relatively uncommon, though I did not compare it against aggregate frequencies for other names in the dataset.

from p1_source import movie_name_plot, movie_name_data

movie_name_plot
movie_name_data
name year Total
287684 Neo 1999 7.0
287685 Neo 2000 50.0
287686 Neo 2001 64.0
287687 Neo 2002 25.0
287688 Neo 2003 51.0
287689 Neo 2004 67.0
287690 Neo 2005 46.0
287691 Neo 2006 70.0
287692 Neo 2007 43.0
287693 Neo 2008 30.0
287694 Neo 2009 46.0
287695 Neo 2010 30.0
287696 Neo 2011 28.0
287697 Neo 2012 29.0
287698 Neo 2013 26.0
287699 Neo 2014 52.0
287700 Neo 2015 38.0
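The "no occurrences before 1999" claim can be checked directly. The sketch below uses the yearly Neo counts copied from the movie_name_data output. Because that table already starts at 1999, the check passes trivially here. On the full names_year.csv you would filter to `name == "Neo"` first. `RELEASE_YEAR` is a parameter introduced for the sketch.

```python
import pandas as pd

# Yearly counts for "Neo" copied from the movie_name_data table
neo = pd.DataFrame({
    "year": list(range(1999, 2016)),
    "Total": [7, 50, 64, 25, 51, 67, 46, 70, 43, 30, 46, 30, 28, 29, 26, 52, 38],
})

RELEASE_YEAR = 1999  # The Matrix

first_year = int(neo["year"].min())
rows_before_release = int((neo["year"] < RELEASE_YEAR).sum())
total_since_release = int(neo.loc[neo["year"] >= RELEASE_YEAR, "Total"].sum())

print(first_year, rows_before_release, total_since_release)  # 1999 0 702
```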

STRETCH QUESTION|TASK 1

Reproduce the Elliot chart using the data from the names_year.csv file.

I created the chart and used lets-plot's ggplot-style tooling to match the look and feel of the example as closely as I could. I couldn't figure out how to label the vertical lines. geom_vline() doesn't seem to support labels, and my attempts with geom_text() caused the entire render to fail.

from p1_source import elliot_data, elliot_plot

elliot_data
name year Total
118687 Elliot 1911.0 5.0
118688 Elliot 1912.0 6.0
118689 Elliot 1913.0 11.0
118690 Elliot 1914.0 24.0
118691 Elliot 1915.0 22.0
... ... ... ...
118787 Elliot 2011.0 891.5
118788 Elliot 2012.0 1042.5
118789 Elliot 2013.0 1064.5
118790 Elliot 2014.0 1199.0
118791 Elliot 2015.0 1250.0

105 rows × 3 columns

elliot_plot